42 research outputs found

    Investigation of Ethernet switches behavior in presence of contending flows at very high-speed

    This paper examines the interactions between layer 2 (Ethernet) switches and TCP in high bandwidth-delay product networks. First, the behavior of a range of Ethernet switches when two long-lived connections compete for the same output port is investigated. Then, the report explores the impact of these behaviors on the TCP protocol in long and fast networks (LFNs). Several conditions are shown in which scheduling mechanisms introduce heavily unfair bandwidth sharing and loss bursts that degrade TCP performance.
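
    As a rough illustration of why long and fast networks stress TCP, the sketch below computes the bandwidth-delay product, i.e. the amount of in-flight data a single connection must sustain to fill the pipe. The 10 Gbit/s capacity and 100 ms RTT are illustrative assumptions, not measurements from the paper.

        # Hedged sketch: bandwidth-delay product of a long fast network (LFN).
        # Link capacity and RTT below are assumed values for illustration only.
        bandwidth_bps = 10e9   # link capacity in bits per second
        rtt_s = 0.100          # round-trip time in seconds

        bdp_bytes = bandwidth_bps * rtt_s / 8
        print(f"BDP = {bdp_bytes / 1e6:.0f} MB of in-flight data")  # ~125 MB

        # A TCP connection needs a congestion window of roughly this size to
        # keep the link busy, so a loss burst at the switch output port can
        # cost many round trips of recovery.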

    A study of large flow interactions in high-speed shared networks with Grid5000 and GtrcNET-10 instruments

    We consider the problem of huge data transfers and bandwidth sharing in contexts where transfer delay bounds are required. This report investigates large flow interactions in a real very high-speed network and aims at contributing to the evaluation of high-speed TCP variants by providing precise measurements. It also gives insight into the behaviour of emulated alternative protocols under different realistic congestion and long-latency conditions in 10 Gbps experimental environments.

    Imbalance of CPU temperatures in a blade system and its impact for power consumption of fans

    We are now developing a new metric of data center power efficiency to fairly evaluate the contribution of each improvement to power efficiency. To develop it, we built a data center testbed and measured the power consumption of each component and environmental variables in some detail, including the power consumption and temperature of each node, rack and air conditioning unit, as well as the load on the CPUs, disk I/O and the network. In these measurements we found a significant imbalance of CPU temperatures that caused an imbalance in the power consumption of fans. We clarified the relationship between CPU load and fan speed, and showed that scheduling or rearrangement of nodes could reduce the power consumption of fans. By changing the scheduling of five nodes, moving work from hot nodes to cool nodes, we reduced fan power consumption by up to 62% and total power consumption by up to 12%.
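
    A minimal sketch of the temperature-aware scheduling idea, assuming hypothetical node names and temperature readings; the paper's actual testbed interfaces and scheduler are not reproduced here.

        # Hedged sketch: prefer cool nodes when placing work, so that fan
        # speed (and hence fan power) stays low. Node names and temperatures
        # are hypothetical, not the paper's testbed data.
        nodes = {
            "blade01": 68.0,  # CPU temperature in deg C
            "blade02": 45.5,
            "blade03": 52.0,
            "blade04": 71.5,
            "blade05": 48.0,
        }

        def pick_cool_nodes(temps, count):
            """Return the `count` coolest nodes, coolest first."""
            return sorted(temps, key=temps.get)[:count]

        print(pick_cool_nodes(nodes, 3))  # ['blade02', 'blade05', 'blade03']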

    At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache

    Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities of future HPC-focused processors, particularly with 3D-stacked SRAM. First, we propose a method, oblivious to the memory subsystem, to gauge the upper bound on performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of LARC, a processor fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. Through a large volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal where HPC CPU performance could be circa 2028, and conclude an average boost of 9.77x for cache-sensitive HPC applications, on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.

    Large Scale Gigabit Emulated Testbed for Grid Transport Evaluation

    Evaluating the performance of Grid applications running on high-performance platforms interconnected by high-speed and long-distance networks with new transport services and protocols is strongly needed. This paper presents the eWAN integrated environment enabling large-scale grid emulation at gigabit speed. It discusses the features provided to control the key characteristics (topology, round-trip time, packet size, drop rate, link capacity) of an evaluation scenario. A method to increase the accuracy of rate control under various delay configurations is proposed and some experimental results are detailed.
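
    eWAN itself is not described at the command level here; as a hedged illustration of the kind of knobs such an emulator exposes (delay, drop rate, link capacity), the sketch below drives the standard Linux netem and tbf queueing disciplines from Python. The interface name and parameter values are assumptions, and this is not eWAN's actual implementation.

        # Hedged sketch: emulating a long-distance link on a Linux host with
        # netem (delay, loss) and tbf (rate limiting). Requires root.
        import subprocess

        IFACE = "eth0"  # assumed interface name

        def run(cmd):
            print("+", cmd)
            subprocess.run(cmd.split(), check=True)

        # 50 ms one-way delay and 0.01% loss on outgoing packets
        run(f"tc qdisc add dev {IFACE} root handle 1: netem delay 50ms loss 0.01%")
        # cap the emulated link at 1 Gbit/s
        run(f"tc qdisc add dev {IFACE} parent 1: handle 2: tbf rate 1gbit burst 128kb latency 50ms")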

    Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing

    For work sharing among GPUs and CPU cores on GPU-equipped clusters, keeping the load balanced among these heterogeneous computing resources is a critical issue. We have been developing a runtime system for this problem on a PGAS language, named XcalableMP-dev/StarPU [1]. Through this development, we found that adaptive load balancing is necessary for GPU/CPU work sharing to achieve the best performance across various application codes. In this paper, we enhance our language system XcalableMP-dev/StarPU with a new feature that dynamically controls the task size assigned to these heterogeneous resources during application execution. As a result of performance evaluation on several benchmarks, we confirmed that the proposed feature works correctly and that heterogeneous work sharing provides up to about 40% higher performance than GPU-only utilization, even for relatively small problem sizes.
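
    A minimal sketch of the adaptive task-size idea, assuming a simple throughput-feedback rule; the step size, the measurement interface and the function name are illustrative assumptions and do not reproduce the XcalableMP-dev/StarPU runtime.

        # Hedged sketch of adaptive GPU/CPU task-size control: shift work
        # toward whichever resource finished its previous chunk faster.
        def rebalance(gpu_ratio, gpu_time, cpu_time, step=0.05):
            """Return an updated fraction of each task assigned to the GPU."""
            if gpu_time < cpu_time:        # GPU finished first: give it more work
                gpu_ratio = min(0.95, gpu_ratio + step)
            elif cpu_time < gpu_time:      # CPUs finished first: shift work back
                gpu_ratio = max(0.05, gpu_ratio - step)
            return gpu_ratio

        ratio = 0.5
        for gpu_t, cpu_t in [(1.2, 2.0), (1.3, 1.9), (1.6, 1.5)]:  # measured times
            ratio = rebalance(ratio, gpu_t, cpu_t)
        print(f"GPU share after three iterations: {ratio:.2f}")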